import pandas as pd
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"
For this exercise, the following code loads the stock dataset built into Plotly Express.
stocks = px.data.stocks()
stocks.head()
| | date | GOOG | AAPL | AMZN | FB | NFLX | MSFT |
|---|---|---|---|---|---|---|---|
| 0 | 2018-01-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 1 | 2018-01-08 | 1.018172 | 1.011943 | 1.061881 | 0.959968 | 1.053526 | 1.015988 |
| 2 | 2018-01-15 | 1.032008 | 1.019771 | 1.053240 | 0.970243 | 1.049860 | 1.020524 |
| 3 | 2018-01-22 | 1.066783 | 0.980057 | 1.140676 | 1.016858 | 1.307681 | 1.066561 |
| 4 | 2018-01-29 | 1.008773 | 0.917143 | 1.163374 | 1.018357 | 1.273537 | 1.040708 |
Select a stock and create a suitable plot for it. Make sure the plot is readable and shows the relevant information, such as dates and values.
# axes x and y
x = stocks["date"]
y = stocks["GOOG"]
# initiate plot with title and labels
fig, ax = plt.subplots(1,1,figsize=(10,8))
ax.plot(x,y)
ax.set_title("Google stock")
ax.set_xlabel("date")
ax.set_ylabel("stock value")
# keep every 14th tick on the x axis so the date labels do not overlap
xticks = ax.get_xticks()
new_xticks = xticks[0::14]
ax.set_xticks(new_xticks)
# show plot
plt.show()
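The manual tick thinning above is needed when the `date` column holds plain strings, because Matplotlib then treats every row as its own category. A minimal sketch of an alternative, assuming the strings parse cleanly with `pd.to_datetime`: convert the column to timestamps and let Matplotlib pick the tick positions. The small `demo` frame is a stand-in built from the first five rows of the table above, not the full dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Stand-in data: the first five rows of the stocks table shown above
demo = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-08", "2018-01-15", "2018-01-22", "2018-01-29"],
    "GOOG": [1.000000, 1.018172, 1.032008, 1.066783, 1.008773],
})

# Parse the strings into timestamps so the x axis becomes a real time axis
demo["date"] = pd.to_datetime(demo["date"])

fig, ax = plt.subplots(figsize=(10, 8))
ax.plot(demo["date"], demo["GOOG"])
ax.set_title("Google stock")
ax.set_xlabel("date")
ax.set_ylabel("stock value")

# Let Matplotlib choose the tick positions and format them compactly
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
```

In a notebook the figure is displayed at the end of the cell; in a script, call `plt.show()` afterwards.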
You've already plotted data for one stock. You can also plot several stocks in the same figure to make comparison easier.
To distinguish the lines, customise their line styles, markers, and colors, and add a legend to the plot.
# axes x and y
x = stocks["date"]
y1 = stocks["GOOG"]
y2 = stocks["AAPL"]
y3 = stocks["AMZN"]
y4 = stocks["FB"]
y5 = stocks["NFLX"]
y6 = stocks["MSFT"]
# initiate plots with title, labels, and legend
fig, ax = plt.subplots(1,1,figsize=(10,8))
ax.plot(x, y1, label = "GOOG")
ax.plot(x, y2, label = "AAPL")
ax.plot(x, y3, label = "AMZN")
ax.plot(x, y4, label = "FB")
ax.plot(x, y5, label = "NFLX")
ax.plot(x, y6, label = "MSFT")
ax.set_title("Stocks")
ax.set_xlabel("date")
ax.set_ylabel("stock value")
ax.legend()
# keep every 14th tick on the x axis so the date labels do not overlap
xticks = ax.get_xticks()
new_xticks = xticks[0::14]
ax.set_xticks(new_xticks)
# show plot
plt.show()
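The solution above relies on Matplotlib's default color cycle and adds a legend, but it does not yet vary line styles or markers. One possible way to do that is sketched below, assuming a per-ticker dictionary of style keywords. The `styles` mapping and its values are arbitrary illustrative choices, and `demo` again holds only the first five rows of three tickers copied from the table above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data: first five rows of three tickers from the stocks table
demo = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-08", "2018-01-15", "2018-01-22", "2018-01-29"],
    "GOOG": [1.000000, 1.018172, 1.032008, 1.066783, 1.008773],
    "AAPL": [1.000000, 1.011943, 1.019771, 0.980057, 0.917143],
    "AMZN": [1.000000, 1.061881, 1.053240, 1.140676, 1.163374],
})

# One set of style keywords per ticker (illustrative choices, not prescribed)
styles = {
    "GOOG": {"linestyle": "-", "marker": "o", "color": "tab:blue"},
    "AAPL": {"linestyle": "--", "marker": "s", "color": "tab:orange"},
    "AMZN": {"linestyle": ":", "marker": "^", "color": "tab:green"},
}

fig, ax = plt.subplots(figsize=(10, 8))
for ticker, style in styles.items():
    # label feeds the legend; the style keywords are passed through to plot()
    ax.plot(demo["date"], demo[ticker], label=ticker, **style)

ax.set_title("Stocks")
ax.set_xlabel("date")
ax.set_ylabel("stock value")
ax.legend(title="ticker")
```

Keeping the styles in one dictionary means a ticker can be added or restyled without touching the plotting loop.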
First, load the tips dataset:
tips = sns.load_dataset('tips')
tips.head()
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
Let's explore this dataset. Pose a question and create a plot that helps answer it.
Some possible questions:
# Is there a correlation between day of the week and total bills paid?
df_thurs = tips[(tips['day'] == 'Thur')]
df_fri = tips[(tips['day'] == 'Fri')]
df_sat = tips[(tips['day'] == 'Sat')]
df_sun = tips[(tips['day'] == 'Sun')]
fig = plt.figure(figsize=(10,10))
gs = fig.add_gridspec(nrows=4, ncols=1)
ax = fig.add_subplot(gs[0,:])
ax.set_title('Thursday')
x = df_thurs['total_bill']
ax.set_ylabel('Frequency')
ax.set_xlabel('Total Bill')
ax.hist(x, bins=50, rwidth=0.8, color='g')
ax = fig.add_subplot(gs[1,:])
ax.set_title('Friday')
x = df_fri['total_bill']
ax.set_ylabel('Frequency')
ax.set_xlabel('Total Bill')
ax.hist(x, bins=50, rwidth=0.8, color='r')
ax = fig.add_subplot(gs[2,:])
ax.set_title('Saturday')
x = df_sat['total_bill']
ax.set_ylabel('Frequency')
ax.set_xlabel('Total Bill')
ax.hist(x, bins=50, rwidth=0.8, color='m')
ax = fig.add_subplot(gs[3,:])
ax.set_title('Sunday')
x = df_sun['total_bill']
ax.set_ylabel('Frequency')
ax.set_xlabel('Total Bill')
ax.hist(x, bins=50, rwidth=0.8, color = 'y')
fig.tight_layout()
# Joint plot
sns.jointplot(x='total_bill', y='day', data=tips)
plt.show()
Redo the exercises above (challenges 2 and 3) with Plotly Express, so that the resulting diagrams are interactive.
Hints:
fig = px.line(stocks, x="date", y=stocks.columns[1:7])
fig.show()
df = px.data.tips()
fig = px.histogram(df, x="total_bill", color = "day")
fig.show()
Recreate the bar plot below, which shows the population of each continent in 2007.
Hints:
#load data
df = px.data.gapminder()
df.head()
| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | AFG | 4 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 | AFG | 4 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 | AFG | 4 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 | AFG | 4 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | AFG | 4 |
# load 2007 data
df_2007 = df.query('year==2007')
# Sum up population by continent (select 'pop' so text columns are not summed)
df_sum = df_2007.groupby('continent')[['pop']].sum()
# Sort data into ascending order
df_sorted = df_sum.sort_values(by='pop',ascending=True)
# Create bar chart (take the continent names from df_sorted so they match the sorted bars)
fig = px.bar(df_sorted, x="pop", y=df_sorted.index,
orientation='h', color = df_sorted.index,
labels={'y':'continent','pop':'population'}, text = 'pop')
fig.show()